The prepared dataset of 6,547,439 monthly records for 359,300 properties was aggregated to a combined SA4 and GCC level. All SA4s within the five metropolitan areas were aggregated to their GCC, and the remaining SA4s were used “as is”.
There are 50 SA4_GCC areas in Australia.
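The aggregation rule above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the field names (`sa4`, `gcc`, `month`, `revenue`), the `metro_gccs` parameter and the toy records are assumptions; only the `<name> - GCC` label format is taken from the output shown in this report.

```python
from collections import defaultdict

def sa4_gcc_name(sa4_name, gcc_name, metro_gccs):
    """Return the combined label: the GCC for metropolitan SA4s, else the SA4 itself."""
    if gcc_name in metro_gccs:
        return f"{gcc_name} - GCC"
    return sa4_name

def aggregate_revenue(records, metro_gccs):
    """Sum revenue per (combined area, month)."""
    totals = defaultdict(int)
    for rec in records:
        area = sa4_gcc_name(rec["sa4"], rec["gcc"], metro_gccs)
        totals[(area, rec["month"])] += rec["revenue"]
    return dict(totals)

# Toy records with made-up revenue values (illustrative only).
records = [
    {"sa4": "Adelaide - North", "gcc": "Adelaide", "month": "2018-01", "revenue": 100},
    {"sa4": "Adelaide - South", "gcc": "Adelaide", "month": "2018-01", "revenue": 50},
    {"sa4": "Barossa - Yorke - Mid North", "gcc": "Rest of SA", "month": "2018-01", "revenue": 30},
]
totals = aggregate_revenue(records, metro_gccs={"Adelaide"})
print(totals)
```

Metropolitan SA4s collapse into one row per month, while every other SA4 keeps its own label.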
Locations of Airbnb properties (points) were linked to the 2016 ABS boundaries of SA areas (polygons). A spatial join in ArcGIS with the CLOSEST option was used so that locations falling outside every polygon were assigned to the nearest one (see, for example, https://www.airbnb.com/rooms/19103554; for privacy reasons, listing locations are not exact).
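To illustrate the idea behind a closest-match join, the sketch below assigns each point to the polygon at the smallest distance, where a point inside a polygon has distance zero. It is a simplified planar approximation and not ArcGIS's implementation; all function names and the two toy squares are hypothetical.

```python
import math

def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies inside the polygon (list of (x, y) vertices)."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def segment_distance(pt, a, b):
    """Euclidean distance from pt to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def polygon_distance(pt, poly):
    """Zero if pt is inside the polygon, else the distance to its nearest edge."""
    if point_in_polygon(pt, poly):
        return 0.0
    return min(
        segment_distance(pt, poly[i], poly[(i + 1) % len(poly)])
        for i in range(len(poly))
    )

def closest_area(pt, areas):
    """Name of the area whose polygon is nearest to pt."""
    return min(areas, key=lambda name: polygon_distance(pt, areas[name]))

# Two toy squares with a gap between them (planar coordinates assumed).
areas = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(3, 0), (5, 0), (5, 2), (3, 2)],
}
inside_a = closest_area((1, 1), areas)    # falls inside A
near_b = closest_area((2.8, 1), areas)    # in the gap, nearer to B
print(inside_a, near_b)
```

Real boundary data would need projected coordinates or geodesic distances, which this sketch ignores.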
Every SA4_GCC area contained Airbnb properties, but a few regions are missing monthly observations on revenue.
## # A tibble: 1,464 x 3
## SA4_GCC_NAME16 reporting_month revenue
## <chr> <date> <int>
## 1 Adelaide - GCC 2014-08-01 0
## 2 Adelaide - GCC 2014-09-01 0
## 3 Adelaide - GCC 2014-10-01 0
## 4 Adelaide - GCC 2014-11-01 0
## 5 Adelaide - GCC 2014-12-01 0
## 6 Adelaide - GCC 2015-01-01 0
## 7 Adelaide - GCC 2015-02-01 0
## 8 Adelaide - GCC 2015-03-01 0
## 9 Adelaide - GCC 2015-04-01 0
## 10 Adelaide - GCC 2015-05-01 0
## # ... with 1,454 more rows
In such cases, the missing months in the time series were filled with zeros.
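One way to picture the zero filling is to build the full grid of areas and months and look up each cell, defaulting to zero. The sketch below assumes revenue is held in a dictionary keyed by `(area, month)`; that structure, the helper names and the toy values are illustrative and not taken from the original analysis.

```python
from datetime import date

def month_range(start, end):
    """List the first day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def fill_missing_months(revenue, areas, months):
    """Return a complete area x month panel, using 0 where no observation exists."""
    return {(a, m): revenue.get((a, m), 0) for a in areas for m in months}

# Toy input: area "Y" has no record for September 2014.
observed = {
    ("X", date(2014, 8, 1)): 10,
    ("X", date(2014, 9, 1)): 20,
    ("Y", date(2014, 8, 1)): 5,
}
months = month_range(date(2014, 8, 1), date(2014, 9, 1))
panel = fill_missing_months(observed, ["X", "Y"], months)
print(panel)
```

After filling, every area has the same number of monthly observations, so the panel is balanced.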
Cumulative monthly Airbnb revenue for the 50 areas (2,750 data points, i.e. 55 months per area):
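A cumulative series is a running sum of the monthly values within each area. The sketch below assumes the monthly revenue of each area is already in chronological order; the function name and toy numbers are illustrative.

```python
from itertools import accumulate

def cumulative_by_area(monthly):
    """monthly: {area: [revenue per month, chronological]} -> running totals per area."""
    return {area: list(accumulate(values)) for area, values in monthly.items()}

# Toy monthly revenue for two areas (made-up values).
monthly = {"X": [0, 10, 5], "Y": [3, 0, 7]}
cumulative = cumulative_by_area(monthly)
print(cumulative)

# Size check for a balanced panel: 50 areas with 55 months each.
n_points = 50 * 55
print(n_points)
```

Because revenue is never negative, each cumulative trajectory is non-decreasing.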
Relative revenue was calculated for 2018 data:
This is the view across regions:
See Genolini et al. (2015) at https://www.jstatsoft.org/article/view/v065i04
There was no clear evidence for an optimal number of clusters.
## cluster4geometry  n percent
##                A 25    0.50
##                B 15    0.30
##                C  7    0.14
##                D  3    0.06
## cluster6geometry  n percent
##                A 16    0.32
##                B 12    0.24
##                C  9    0.18
##                D  6    0.12
##                E  5    0.10
##                F  2    0.04
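The percent column in both tables is the cluster size divided by the 50 areas. The short check below reproduces it from the counts reported above; the function name is illustrative.

```python
def cluster_shares(counts):
    """Share of areas per cluster, rounded to two decimals."""
    total = sum(counts.values())
    return {label: round(n / total, 2) for label, n in counts.items()}

# Cluster sizes as reported in the two tables.
four = {"A": 25, "B": 15, "C": 7, "D": 3}
six = {"A": 16, "B": 12, "C": 9, "D": 6, "E": 5, "F": 2}

shares_four = cluster_shares(four)
shares_six = cluster_shares(six)
print(shares_four)
print(shares_six)
```

Both solutions account for all 50 areas.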
The 4-cluster solution was used.